09/06/2024 - 15/06/2024

12/06/2024 06:38

I'm trying to use PCIe Gen 3 by moving the card into our newer machine in the lab. The card shows up in lspci -vv as expected:

05:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 160
        Region 0: Memory at 51100000 (32-bit, non-prefetchable) [size=1M]
        Region 1: Memory at 51200000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee006d8  Data: 0000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: xdma
        Kernel modules: xdma
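
The fields that matter for the link speed question are LnkCap (what the card advertises) and LnkSta (what was actually negotiated). A minimal sketch for pulling them out of saved lspci -vv text, assuming the line layout shown above (the function name link_info is made up for illustration):

```python
import re

# Two lines copied from the lspci -vv output above.
SAMPLE = """\
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
                LnkSta: Speed 5GT/s (ok), Width x4 (ok)
"""

def link_info(lspci_text):
    """Map 'LnkCap' / 'LnkSta' to a (speed in GT/s, lane count) tuple."""
    # The trailing ':' keeps LnkCtl2 / LnkSta2 lines from matching.
    pattern = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")
    return {m.group(1): (float(m.group(2)), int(m.group(3)))
            for m in pattern.finditer(lspci_text)}

info = link_info(SAMPLE)
print(info)
if info["LnkCap"] == info["LnkSta"]:
    print("Link trained at the maximum the card advertises.")
```

If LnkCap and LnkSta agree, the slot is not holding the card back; the card itself is only advertising that speed.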

I moved the driver files from fe01 to the newer desktop.

Running build-install-driver-linux.sh in /home/pioneer/pcie_testing/XilinxAR65444/Linux did not work initially. The make call in the script complains:

function ‘mmiowb’ [-Werror=implicit-function-declaration]
  921 |         mmiowb();
      |         ^~~~~~
cc1: some warnings being treated as errors

I googled the issue and found this forum post:
https://support.xilinx.com/s/question/0D52E00006hpLONSA2/compilation-error-pcie-drivers-for-linux?language=en_US

Which basically just says to comment out all instances of mmiowb(). There is exactly one in xdma-core.c, so I commented it out. Then things compiled. Loading the driver and running some tests seemed to work:

root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./load_driver.sh
xdma                   61440  0
Loading driver...
The Kernel module installed correctly and the xmda devices were recognized.
 DONE
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./run_test.sh

Info: Number of enabled h2c channels = 2
Info: Number of enabled c2h channels = 2
Info: The PCIe DMA core is memory mapped.
Info: Running PCIe DMA memory mapped write read test
      transfer size:  1024
      transfer count: 1
Info: Writing to h2c channel 0 at address offset 0.
Info: Writing to h2c channel 1 at address offset 1024.
Info: Wait for current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000000
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_0, address = 0x00000000, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x559fc9f48800
CLOCK_MONOTONIC reports 0.000095085 seconds (total) for last transfer of 1024 bytes
Transfer speed: 10.27 MB/s
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_1, address = 0x00000400, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55ff1d607800
CLOCK_MONOTONIC reports 0.000060218 seconds (total) for last transfer of 1024 bytes
Transfer speed: 16.22 MB/s
Info: Writing to h2c channel 0 at address offset 2048.
Info: Writing to h2c channel 1 at address offset 3072.
Info: Wait for current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000800
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_0, address = 0x00000800, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555ae6579800
CLOCK_MONOTONIC reports 0.000055266 seconds (total) for last transfer of 1024 bytes
Transfer speed: 17.67 MB/s
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000c00
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_1, address = 0x00000c00, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55790b0e3800
CLOCK_MONOTONIC reports 0.000055462 seconds (total) for last transfer of 1024 bytes
Transfer speed: 17.61 MB/s
Info: Reading from c2h channel 0 at address offset 0.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000000
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_0, address = 0x00000000, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555ba77dc000
Info: Reading from c2h channel 1 at address offset 1024.
Info: Wait for the current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_1, address = 0x00000400, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x559ca193d000
CLOCK_MONOTONIC reports 0.000066850 seconds (total) for last transfer of 1024 bytes
Transfer speed: 14.61 MB/s
CLOCK_MONOTONIC reports 0.000048417 seconds (total) for last transfer of 1024 bytes
Transfer speed: 20.17 MB/s
Info: Reading from c2h channel 0 at address offset 2048.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000800
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_0, address = 0x00000800, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55654b13d000
Info: Reading from c2h channel 1 at address offset 3072.
Info: Wait for the current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000c00
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_1, address = 0x00000c00, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555b3109f000
CLOCK_MONOTONIC reports 0.000077351 seconds (total) for last transfer of 1024 bytes
Transfer speed: 12.63 MB/s
CLOCK_MONOTONIC reports 0.000049442 seconds (total) for last transfer of 1024 bytes
Transfer speed: 19.75 MB/s
Info: Checking data integrity.
Info: Data check passed for address range 0 - 1024.
Info: Data check passed for address range 1024 - 2048.
Info: Data check passed for address range 2048 - 3072.
Info: Data check passed for address range 3072 - 4096.
Info: All PCIe DMA memory mapped tests passed.
Info: All tests in run_tests.sh passed.
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests#
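
For reference, the mmiowb() workaround applied before the build could be scripted instead of edited by hand. This is only a sketch of the forum suggestion (comment the call out), not a proper fix for the missing function; it assumes each call sits on its own line as in the compiler message, and the surrounding write_register line in the example is hypothetical:

```python
import re

def comment_out_mmiowb(source):
    """Wrap each standalone `mmiowb();` statement in a C comment."""
    return re.sub(r"^(\s*)(mmiowb\(\);)", r"\1/* \2 */", source,
                  flags=re.MULTILINE)

# Hypothetical C fragment; only the mmiowb() line is taken from the error above.
snippet = "\twrite_register(val, reg);\n\tmmiowb();\n"
patched = comment_out_mmiowb(snippet)
print(patched)

# Running it a second time changes nothing, since the call is no longer
# the first thing on its line.
assert comment_out_mmiowb(patched) == patched
```

To apply it to the real file, read xdma-core.c, pass the text through comment_out_mmiowb, and write it back after keeping a copy of the original.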

12/06/2024 06:46

Trying to run the speed tests

root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./dma_from_device -d /dev/xdma0_c2h_0 -f data/datafile_32M.bin -s 33554432
sscanf() = 1, value = 0x02000000
device = /dev/xdma0_c2h_0, address = 0x00000000, size = 0x02000000, offset = 0x00000000, count = 1
host memory buffer = 0x7ff3074ed000
CLOCK_MONOTONIC reports 0.041413173 seconds (total) for last transfer of 33554432 bytes
Transfer speed: 772.70 MB/s
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./dma_to_device -d /dev/xdma0_h2c_0 -f data/datafile_32M.bin -s 33554432
sscanf() = 1, value = 0x02000000
device = /dev/xdma0_h2c_0, address = 0x00000000, size = 0x02000000, offset = 0x00000000, count = 1
host memory buffer = 0x7f5e9806d400
CLOCK_MONOTONIC reports 0.035558064 seconds (total) for last transfer of 33554432 bytes
Transfer speed: 899.94 MB/s
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests#

The speed tests show that putting the card in the PCIe 3.0 slot does not speed it up. This is expected because Vivado is somehow hardcoding a limit of 5.0 GT/s, which matches the 5GT/s that lspci reports in LnkCap.
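
To compare runs consistently, the rate can be recomputed from the CLOCK_MONOTONIC lines rather than read off the tool's summary. A minimal sketch, assuming the tool's "MB/s" means bytes divided by 2^20 (inferred from the numbers in the log, not checked against the tool's source):

```python
import re

# The two timing lines from the 32M transfers above.
LOG = """\
CLOCK_MONOTONIC reports 0.041413173 seconds (total) for last transfer of 33554432 bytes
CLOCK_MONOTONIC reports 0.035558064 seconds (total) for last transfer of 33554432 bytes
"""

def rates(log_text):
    """Return a list of (MiB/s, MB/s) tuples, one per timing line."""
    pattern = re.compile(
        r"reports ([\d.]+) seconds \(total\) for last transfer of (\d+) bytes")
    out = []
    for seconds, nbytes in pattern.findall(log_text):
        bytes_per_s = int(nbytes) / float(seconds)
        out.append((bytes_per_s / 2**20, bytes_per_s / 1e6))
    return out

for mib_s, mb_s in rates(LOG):
    print(f"{mib_s:.2f} MiB/s = {mb_s:.2f} MB/s")
```

The first column should line up with the "Transfer speed" lines printed by dma_from_device and dma_to_device; the second is the same rate in decimal megabytes, which is the unit link bandwidth is usually quoted in.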


12/06/2024 07:10

Upon further investigation, this limit appears to be set by the board files. I opened another board file (Versal VCK190 Evaluation Platform) and created an XDMA IP block there. It offered link speeds up to PCIe Gen 4 (16 GT/s). So the speed we see seems to be a limitation of the card.
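
To put the GT/s figures in context, here is a rough sketch of the raw link bandwidth per generation. It only accounts for line encoding (8b/10b for Gen 1 and 2, 128b/130b for Gen 3 and 4) and ignores TLP headers, flow control and DMA descriptor overhead, so these are upper bounds rather than numbers a DMA test should be expected to reach:

```python
# (transfer rate in GT/s, encoding efficiency) per PCIe generation.
GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

def link_bandwidth_mb_s(gen, lanes):
    """Raw one-direction link bandwidth in MB/s (10^6 bytes per second)."""
    gts, efficiency = GENERATIONS[gen]
    bits_per_s = gts * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6

for gen in sorted(GENERATIONS):
    print(f"Gen {gen} x4: {link_bandwidth_mb_s(gen, 4):.0f} MB/s")
```

For the x4 link reported by lspci, this gives 2000 MB/s at 5 GT/s and roughly 3938 MB/s at 8 GT/s.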

Maybe we can try pushing the HTG-K700 to see whether it can go any faster.